About This Project

In this project, I analyzed real online retail transaction datasets to efficiently manage a warehouse inventory.

Why Inventory Management?

You might wonder, “Why inventory management?” Well, have you ever been in a situation where you were so excited to buy specific goods at a store, only to find out that the store had run out of stock? Now, that would be very disappointing and your image of the store might not be as good as it was before. For a retail company, there are two potential downsides to such situation. One, the company loses a sale. And two, the company risks losing a valued customer. Therefore, it is important that the company has enough stock to meet customers’ needs. However, on the other hand, the company should not have too many items in stock because it can be unnecessary and expensive and things can go wrong with those items. This is why doing data analysis on inventory is necessary for effectively running a company. By conducting data analysis, companies can make data driven decisions on how many items they should prepare in stock.

So how are we going to conduct data analysis to make sure that the online retail company has the right amount of items in stock? We are going to use the company’s historical transaction data to figure out the most optimal amount of items to place in stock.

Get the Historical Data

First, we need to get the company’s historical transaction data from the company’s database. We are going to use 2 years worth of data from 2009 to 2011. Ideally, you would want to use dataset that is more relevant. But for this project, we will use this dataset from 2009 to 2011.

Raw data

Examine and Wrangle the Data

Before changing the data format and structure, we need to ask ourselves what tools we are going to use for the analysis, so that we can tailor the dataset specifically for those tools. Since the dataset exceeds 1 million rows, I decided to work mostly in SQL and R. Let’s wrangle.

I got rid of irrelevant columns of data for this analysis, such as “Invoice” number and “Customer ID.”

Then I transformed “InvoiceDate” column into two columns “Year” and “Month”, using Excel’s YEAR and MONTH functions

Examining the dataset using filter function in Excel, I discovered that there were negative values in “Quantity” column.

Negative quantity values

This was peculiar, but soon I realized upon close inspection that the rows with negative quantity values were either entry errors or records of unsaleable items. Because these rows were irrelevant to the analysis and would obscure the result of the analysis if not taken care of, I got rid of them.

Additionally, I replaced “£” sign with “Pound” by using Find & Replace function, so that data is more readable by softwares in general.

Finally, I decided to divide the dataset into two different datasets, one with “StockCode” and “Description”, and another with all the information but “Description.” I separated the dataset so that later I could create two tables and work more flexibly in SQL using JOIN function. The cleaned datasets looked like this:

Cleaned dataset


In order to analyze the data using SQL, I exported the cleaned datasets into csv files. Then, I imported the csv files into the DBMS. Make sure to import the 2009-2010 dataset AND 2010-2011 dataset!

CREATE SCHEMA inventory_management;

CREATE TABLE inventory_management.transaction_data (
    StockCode varchar(50) NULL,
    Quantity integer NULL,
    Price real NULL,
    Country varchar(50) NULL,
    Year integer NULL,
    Month integer NULL
);

COPY transaction_data (
    StockCode,
    Quantity,
    Price,
    Country,
    Year,
    Month
)
FROM '/Users/bruce/Desktop/case_studies/retail/archive/2010_2011.csv'
DELIMITER ','
CSV HEADER; --repeat this process for 2010-2011 datset!

CREATE TABLE inventory_management.stockcode (
    StockCode VARCHAR(50) NULL,
    Description VARCHAR(100) NULL
);

COPY stockcode (
    StockCode,
    Description
)
FROM '/Users/bruce/Desktop/case_studies/retail/archive/stockcode.csv'
DELIMITER ','
CSV HEADER;

Analyze the Data

Now it’s time to analyze the data that we’ve cleaned! The main purpose of data analytics in inventory management is to figure out how many items in stock is good amount to efficiently server customers in the future. To do this, we will create an inventory list from the past to get a good estimate of how many items to stock in the warehouse.

Let’s get an inventory list from 2010, segmenting it by country.

SELECT transaction_data.stockcode, stockcode.description, country, sum(quantity) AS quantity
FROM inventory_management.transaction_data
LEFT JOIN inventory_management.stockcode
ON transaction_data.stockcode = stockcode.stockcode
WHERE stockcode.description IS NOT NULL AND YEAR = 2010
GROUP BY transaction_data.stockcode, stockcode.description, country
ORDER BY stockcode, country;
Here is a snapshot of the list in Excel format. You can download the list here.


Now, let’s analyze a little deeper. We can take a look at one of the most popular items on the list, JAZZ HEARTS SPIRAL NOTEPAD and see how it is distributed by Month and Country.

SELECT transaction_data.stockcode, stockcode.description, country, sum(quantity) AS quantity, month
FROM inventory_management.transaction_data
LEFT JOIN inventory_management.stockcode
ON transaction_data.stockcode = stockcode.stockcode
WHERE stockcode.description = 'JAZZ HEARTS SPIRAL NOTEPAD' AND YEAR = 2010
GROUP BY transaction_data.stockcode, stockcode.description, country, month
ORDER BY month;
Displaying records 1 - 10
stockcode description country quantity month
84077 JAZZ HEARTS SPIRAL NOTEPAD Spain 96 1
84077 JAZZ HEARTS SPIRAL NOTEPAD United Kingdom 1347 1
84077 JAZZ HEARTS SPIRAL NOTEPAD EIRE 96 2
84077 JAZZ HEARTS SPIRAL NOTEPAD France 48 2
84077 JAZZ HEARTS SPIRAL NOTEPAD Sweden 288 2
84077 JAZZ HEARTS SPIRAL NOTEPAD United Kingdom 1968 2
84077 JAZZ HEARTS SPIRAL NOTEPAD France 48 3
84077 JAZZ HEARTS SPIRAL NOTEPAD United Kingdom 4515 3
84077 JAZZ HEARTS SPIRAL NOTEPAD Sweden 576 4
84077 JAZZ HEARTS SPIRAL NOTEPAD United Kingdom 3168 4

We can use Tableau’s heatmap to see how this product is distributed by Month and Country.